Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
Reward models (RMs) are central to the alignment of language models (LMs). An RM often serves as a proxy for human preferences to guide downstream LM behavior. However, our understanding of RM behavior is limited. Our work (i) formalizes a framework for measuring the alignment of opinions captured by RMs, (ii) investigates the extent to which RMs demonstrate sociodemographic biases, and (iii) explores the effects of prompting to steer rewards towards the preferences of a target group. We study subjective and diverse perspectives on controversial topics, which allows us to quantify RM perspectives in terms of their opinions, attitudes, and values. We show that RMs are poorly aligned with several demographic groups, that they can systematically reward harmful stereotypes, and that steering alone is not enough to overcome these limitations. Our findings underscore the need for more careful consideration of RM behavior in model alignment during preference learning to prevent the propagation of unwanted social biases in the language technologies that we use.
- South America > Ecuador (0.04)
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- (21 more...)
- Health & Medicine > Therapeutic Area (0.93)
- Government (0.68)
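
To make the measurement concrete, here is a minimal sketch of the kind of alignment metric the abstract describes: score each answer option of an opinion question with a reward model, turn the scores into an implied answer distribution, and compare it with a demographic group's survey distribution, with and without a steering prefix. `reward_model_score`, the steering prompt, and the survey numbers are all hypothetical stand-ins, not the paper's actual setup.

```python
# Minimal sketch (not the paper's code): quantify how well a reward model's
# implied answer distribution matches a demographic group's survey responses.
import math

def reward_model_score(prompt: str, response: str) -> float:
    # Hypothetical RM stand-in; replace with a real (prompt, response) -> reward call.
    return float(len(set(prompt.split()) & set(response.split())))

def implied_distribution(prompt, options, temperature=1.0):
    # Softmax over RM scores gives the distribution the RM implicitly rewards.
    scores = [reward_model_score(prompt, o) / temperature for o in options]
    z = max(scores)
    exp = [math.exp(s - z) for s in scores]
    total = sum(exp)
    return [e / total for e in exp]

def total_variation(p, q):
    # 0 = perfectly aligned with the group, 1 = maximally misaligned.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

question = "Should the government do more to regulate AI?"
options = ["Yes, much more", "Somewhat more", "No change", "Less"]
group_dist = [0.45, 0.30, 0.15, 0.10]  # hypothetical survey responses for one group

# Unsteered vs. steered with a sociodemographic prompt prefix.
steered = "Answer as a person over 65 would. " + question
for name, prompt in [("unsteered", question), ("steered", steered)]:
    p = implied_distribution(prompt, options)
    print(name, "misalignment (TV):", round(total_variation(p, group_dist), 3))
```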
KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs
Zhou, Zelin, Conia, Simone, Lee, Daniel, Li, Min, Huang, Shenglei, Minhas, Umar Farooq, Potdar, Saloni, Xiao, Henry, Li, Yunyao
Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Although previous efforts have considered KGC and KGE as independent tasks, we hypothesize that they are interdependent and mutually beneficial. To this end, we introduce KG-TRICK, a novel sequence-to-sequence framework that unifies the tasks of textual and relational information completion for multilingual KGs. KG-TRICK demonstrates that: i) it is possible to unify the tasks of KGC and KGE into a single framework, and ii) combining textual information from multiple languages is beneficial to improve the completeness of a KG. As part of our contributions, we also introduce WikiKGE10++, the largest manually curated benchmark for textual information completion of KGs, which features over 25,000 entities across 10 diverse languages.
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (7 more...)
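
A minimal sketch of how KGC and KGE could be cast as one text-to-text problem for a single seq2seq model, as the abstract describes; the prompt formats and field names here are assumptions, not KG-TRICK's actual templates.

```python
# Minimal sketch (assumed formats, not KG-TRICK's prompts): cast both
# KGC (missing relations) and KGE (missing textual information) as
# text-to-text examples so one seq2seq model can train on their union.
def kgc_example(head, relation, tail, lang="en"):
    # Relational completion: predict the missing tail entity.
    return {"input": f"complete relation [{lang}]: {head} | {relation} | ?",
            "target": tail}

def kge_example(entity, field, text, lang="it"):
    # Textual completion: predict a missing name/description in some language.
    return {"input": f"complete {field} [{lang}]: {entity}",
            "target": text}

train = [
    kgc_example("Lisbon", "capital of", "Portugal"),
    kge_example("Lisbona", "description", "capitale del Portogallo"),
]
for ex in train:
    print(ex["input"], "->", ex["target"])
```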
iPhone users say Apple's iOS 18.2 update is RUINING their battery life - here's what to do if your device is affected
Apple Intelligence is essentially a snazzy brand name for Apple's new-found focus on AI, triggered by the huge success of ChatGPT. Here's a look at some of the best features of Apple Intelligence, which comes to the UK via the new iOS 18.2 operating system. Surely the biggest part of Apple Intelligence is the integration of OpenAI's hugely popular chatbot ChatGPT with Siri, Apple's in-built virtual assistant. With better 'language-understanding capabilities' enabled by ChatGPT, Siri will help you across multiple apps and 'accelerate everyday tasks', Apple said. You'll be able to press and hold the side button to activate Siri as normal, but with ChatGPT behind it Siri will be able to 'answer thousands of questions about how to do something' that it couldn't before.
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Chen, Xupeng, Lai, Zhixin, Ruan, Kangrui, Chen, Shichu, Liu, Jiaxiang, Liu, Zuozhu
Artificial intelligence has made significant strides in medical visual question answering (Med-VQA), yet prevalent studies often interpret images holistically, overlooking the visual regions of interest that may contain crucial information. Such regions align with a doctor's prior knowledge and can be incorporated with minimal annotations (e.g., bounding boxes). To address this gap, this paper introduces R-LLaVA, designed to enhance biomedical VQA understanding by integrating simple medical annotations as prior knowledge directly into the image space through CLIP. These annotated visual regions of interest are then fed into the LLaVA model during training, aiming to enrich the model's understanding of biomedical queries. Experimental evaluation on four standard Med-VQA datasets demonstrates R-LLaVA's superiority over existing state-of-the-art (SoTA) methods. Additionally, to verify the model's capability in visual comprehension, a novel multiple-choice medical visual understanding dataset is introduced, confirming the positive impact of focusing on visual regions of interest in advancing biomedical VQA understanding.
- North America > United States > New York (0.04)
- Europe > Switzerland (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
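
One plausible reading of integrating annotations "directly into the image space" is to draw the doctor-provided bounding box onto the image before it reaches the CLIP encoder. The sketch below illustrates that mechanism under this assumption; it is illustrative, not R-LLaVA's released code.

```python
# Minimal sketch (an assumption about the mechanism, not R-LLaVA's code):
# burn a bounding-box region of interest directly into the image so that
# a frozen CLIP-style encoder "sees" the annotated region.
from PIL import Image, ImageDraw

def add_roi(image: Image.Image, box, color="red", width=4) -> Image.Image:
    # Draw the annotation onto a copy; the pixels themselves carry the prior.
    annotated = image.copy()
    ImageDraw.Draw(annotated).rectangle(box, outline=color, width=width)
    return annotated

img = Image.new("RGB", (336, 336), "gray")  # placeholder for a medical scan
roi = (80, 60, 200, 180)                    # (x0, y0, x1, y1) from an annotation
annotated = add_roi(img, roi)
# `annotated` would then pass through CLIP and into LLaVA training.
annotated.save("annotated.png")
```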
Moonshine: Distilling Game Content Generators into Steerable Generative Models
Nie, Yuhe, Middleton, Michael, Merino, Tim, Kanagaraja, Nidhushan, Kumar, Ashutosh, Zhuang, Zhan, Togelius, Julian
Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We use these synthetic labels to condition two PCGML models for content-specific generation, a diffusion model and the five-dollar model. This neural network distillation process ensures that the generation aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of the generated content demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.
- North America > United States > New York (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
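
A minimal sketch of the distillation data pipeline the abstract outlines: a constructive generator produces maps, an LLM labels them with plain text, and the resulting pairs become training data for a text-conditioned PCGML model. Both the generator and `llm_label` below are stubs, not Moonshine's components.

```python
# Minimal sketch (stubs, not Moonshine's code) of the distillation pipeline:
# constructive generation -> LLM labeling -> (text, map) training pairs.
import random

def constructive_generator(width=8, height=8, wall_prob=0.2):
    # Stub: a real constructive algorithm would apply hand-written rules here.
    return [["#" if random.random() < wall_prob else "."
             for _ in range(width)] for _ in range(height)]

def llm_label(tile_map):
    # Hypothetical LLM call; a real pipeline would prompt an actual LLM.
    walls = sum(row.count("#") for row in tile_map)
    return "a dense maze" if walls > 12 else "an open room"

dataset = []
for _ in range(100):
    m = constructive_generator()
    dataset.append({"text": llm_label(m), "map": m})

# `dataset` is what the text-conditioned models (e.g., the diffusion model)
# would be trained on for the T2M task.
print(dataset[0]["text"])
```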
Argument-Aware Approach To Event Linking
Hsu, I-Hung, Xue, Zihan, Pochh, Nilay, Bansal, Sahil, Natarajan, Premkumar, Srinivasa, Jayanth, Peng, Nanyun
Event linking connects event mentions in text with relevant nodes in a knowledge base (KB). Prior research in event linking has mainly borrowed methods from entity linking, overlooking the distinct features of events. Compared to the extensively explored entity linking task, events have more complex structures and can be more effectively distinguished by examining their associated arguments. Moreover, the information-rich nature of events leads to the scarcity of event KBs. This emphasizes the need for event linking models to identify and classify event mentions not in the KB as "out-of-KB," an area that has received limited attention. In this work, we tackle these challenges by introducing an argument-aware approach. First, we improve event linking models by augmenting input text with tagged event argument information, facilitating the recognition of key information about event mentions. Subsequently, to help the model handle "out-of-KB" scenarios, we synthesize out-of-KB training examples from in-KB instances through controlled manipulation of event arguments. Our experiments on two test datasets show significant gains in both in-KB and out-of-KB scenarios, with a notable 22% improvement in out-of-KB evaluations.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (15 more...)
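
A minimal sketch of the two ideas in the abstract, under an assumed tag format: inline argument tags augment the event mention, and swapping one argument synthesizes an out-of-KB negative from an in-KB instance.

```python
# Minimal sketch (assumed tag format, not the paper's code): augment an event
# mention with inline argument tags, then synthesize an out-of-KB example by
# controlled manipulation of one argument.
def tag_arguments(text: str, args: dict) -> str:
    tagged = text
    for role, span in args.items():
        tagged = tagged.replace(span, f"<{role}> {span} </{role}>")
    return tagged

mention = "Apple acquired Shazam in 2018."
args = {"acquirer": "Apple", "acquiree": "Shazam", "time": "2018"}
in_kb = tag_arguments(mention, args)

# Swap one argument so the event no longer matches any KB node,
# yielding an out-of-KB training example from an in-KB instance.
out_of_kb = in_kb.replace("Shazam", "Acme Corp")
print(in_kb)
print(out_of_kb)
```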
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Panagopoulou, Artemis, Xue, Le, Yu, Ning, Li, Junnan, Li, Dongxu, Joty, Shafiq, Xu, Ran, Savarese, Silvio, Xiong, Caiming, Niebles, Juan Carlos
Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific customization. To facilitate instruction-modality fine-tuning, we collect high-quality instruction tuning data in an automatic and scalable manner, composed of 24K QA samples for audio and 250K QA samples for 3D. Leveraging instruction-aware representations, our model performs comparably with leading-edge counterparts without the need for extensive modality-specific pre-training or customization. Furthermore, our approach demonstrates cross-modal reasoning abilities across two or more input modalities, despite each modality projection being trained individually. To study the model's cross-modal abilities, we contribute a novel Discriminative Cross-modal Reasoning (DisCRn) evaluation task, comprising 9K audio-video QA samples and 28K image-3D QA samples that require the model to reason discriminatively across disparate input modalities.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- (7 more...)
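
A minimal sketch of the architecture pattern the abstract describes, with a simplifying assumption: a plain linear projection stands in for the instruction-aware projection, one per modality, each trained independently against the frozen LLM. Dimensions and encoder choices are placeholders.

```python
# Minimal sketch (a simplification, not X-InstructBLIP's code): each modality
# gets its own projection into the frozen LLM's embedding space, and each
# projection is trained independently.
import torch
import torch.nn as nn

LLM_DIM = 4096  # assumed hidden size of the frozen LLM

class ModalityProjector(nn.Module):
    def __init__(self, enc_dim: int, num_tokens: int = 32):
        super().__init__()
        self.proj = nn.Linear(enc_dim, LLM_DIM)
        self.num_tokens = num_tokens

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_tokens, enc_dim) from a frozen modality encoder
        return self.proj(features)  # -> (batch, num_tokens, LLM_DIM)

audio_proj = ModalityProjector(enc_dim=768)  # e.g., an audio encoder's dim
pcd_proj = ModalityProjector(enc_dim=512)    # e.g., a 3D point-cloud encoder's dim

audio_tokens = audio_proj(torch.randn(1, 32, 768))
pcd_tokens = pcd_proj(torch.randn(1, 32, 512))
# Both token sequences can be concatenated with text embeddings and fed to the
# frozen LLM, which is what enables cross-modal prompts at inference time.
print(audio_tokens.shape, pcd_tokens.shape)
```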
"Kurosawa": A Script Writer's Assistant
Gandhi, Prerak, Pramanik, Vishal, Bhattacharyya, Pushpak
Storytelling is the lifeline of the entertainment industry -- movies, TV shows, and stand-up comedies all need stories. A good and gripping script is the lifeline of storytelling and demands creativity and resource investment. Good scriptwriters are hard to find and often work under severe time pressure. Consequently, entertainment media are actively looking for automation. In this paper, we present an AI-based script-writing workbench called KUROSAWA which addresses the tasks of plot generation and script generation. Plot generation aims to generate a coherent and creative plot (600-800 words) given a prompt (15-40 words). Script generation, on the other hand, generates a scene (200-500 words) in a screenplay format from a brief description (15-40 words). Kurosawa needs training data. We use a 4-act structure of storytelling to annotate the plot dataset manually. We create a dataset of 1000 manually annotated plots and their corresponding prompts/storylines and a gold-standard dataset of 1000 scenes with four main elements -- scene headings, action lines, dialogues, and character names -- tagged individually. We fine-tune GPT-3 with the above datasets to generate plots and scenes. These plots and scenes are first evaluated and then used by the scriptwriters of a large and famous media platform, ErosNow. We release the annotated datasets and the models trained on these datasets as a working benchmark for automatic movie plot and script generation.
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (2 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
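
A minimal sketch of how the (storyline, plot) pairs could be serialized for GPT-3 fine-tuning, which used prompt/completion JSONL records; the field layout and act markers here are assumptions, not the released dataset's schema.

```python
# Minimal sketch (assumed field layout, not the released dataset's schema):
# serialize (storyline, 4-act plot) pairs into the prompt/completion JSONL
# format that GPT-3 fine-tuning expects.
import json

def plot_record(storyline: str, acts: dict) -> dict:
    # Flatten the 4-act annotation into a single labeled completion string.
    plot = " ".join(f"[{name}] {text}" for name, text in acts.items())
    return {"prompt": f"Storyline: {storyline}\nPlot:", "completion": " " + plot}

record = plot_record(
    "A retired detective takes one last case in Mumbai.",
    {"Act 1": "The detective is pulled back in...",
     "Act 2": "The case deepens...",
     "Act 3": "A betrayal surfaces...",
     "Act 4": "The truth is revealed..."},
)
with open("plots.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```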
Instant videos could represent the next leap in AI technology - Toysmatrix
Ian Sansavera, a software architect at a New York startup called Runway AI, typed a short description of what he wanted to see in a video. "A tranquil river in the forest," he wrote. Less than two minutes later, an experimental internet service generated a short video of a tranquil river in a forest. The river's running water glistened in the sun as it cut between trees and ferns, turned a corner and splashed gently over rocks. Runway, which plans to open its service to a small group of testers this week, is one of several companies building artificial intelligence technology that will soon let people generate videos simply by typing several words into a box on a computer screen.
- North America > United States > New York (0.25)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Massachusetts (0.05)
- (3 more...)